 causal regularization



Causal Regularization

Neural Information Processing Systems

We argue that regularizing terms in standard regression methods not only help against overfitting finite data, but sometimes also help in getting better causal models. We first consider a multi-dimensional variable linearly influencing a target variable with some multi-dimensional unobserved common cause, where the confounding effect can be decreased by keeping the penalizing term in Ridge and Lasso regression even in the population limit. The reason is a close analogy between overfitting and confounding observed for our toy model. In the case of overfitting, we can choose regularization constants via cross-validation, but here we choose the regularization constant by first estimating the strength of confounding, which yielded reasonable results for simulated and real data. Further, we show a 'causal generalization bound' which states (subject to our particular model of confounding) that the error made by interpreting any non-linear regression as a causal model can be bounded from above whenever functions are taken from a class that is not too rich.
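The setting described in the abstract can be explored with a small simulation. The sketch below is an illustration under assumed specifics, not the paper's procedure: the data-generating equations (X = E + ZM, y = Xa + Zc + noise), the Gaussian draws, the scalings, and the function names `simulate_confounded` and `ridge` are all hypothetical choices made here. It only lets one inspect how the distance between the ridge coefficients and the assumed causal vector `a` changes with the penalty when the sample is large; it does not implement the paper's method for choosing the regularization constant from an estimate of confounding strength.

```python
import numpy as np


def simulate_confounded(n=5000, d=10, ell=10, seed=0):
    """Draw data from an assumed linear model with a hidden common cause Z.

    X = E + Z M          (observed covariates, mixed with hidden sources)
    y = X a + Z c + eps  (target; a is the causal vector, Z c the confounding)
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=d) / np.sqrt(d)           # assumed causal coefficients
    M = rng.normal(size=(ell, d)) / np.sqrt(ell)  # assumed mixing matrix
    c = rng.normal(size=ell)                      # effect of Z on the target
    Z = rng.normal(size=(n, ell))                 # unobserved common cause
    E = rng.normal(size=(n, d))                   # independent part of X
    X = E + Z @ M
    y = X @ a + Z @ c + 0.1 * rng.normal(size=n)
    return X, y, a


def ridge(X, y, lam):
    """Closed-form ridge estimate; lam = 0 gives ordinary least squares."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


if __name__ == "__main__":
    X, y, a = simulate_confounded()
    for lam in [0.0, 1e2, 1e3, 1e4]:
        b = ridge(X, y, lam)
        # Distance to the causal vector, not to the best predictor of y.
        print(f"lambda={lam:g}  ||b - a|| = {np.linalg.norm(b - a):.3f}")
```

Whether a given penalty moves the estimate closer to `a` depends on the draw and on the strength of confounding, so the printed distances should be read as an exploration rather than a confirmation of the claim.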


CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation

Zhang, Jia-Chen, Zhou, Zheng, Xiong, Yu-Jie, Xia, Chun-Ming, Dai, Fei

arXiv.org Artificial Intelligence

Training data has been proven to be one of the most critical components in training generative AI. However, obtaining high-quality data remains challenging, with data privacy issues presenting a significant hurdle. To address the need for high-quality data, synthetic data has emerged as a mainstream solution, demonstrating impressive performance in areas such as images, audio, and video. Generating mixed-type data, especially high-quality tabular data, still faces significant challenges. These primarily include its inherently heterogeneous data types, complex inter-variable relationships, and intricate column-wise distributions. In this paper, we introduce CausalDiffTab, a diffusion-based generative model specifically designed to handle mixed tabular data containing both numerical and categorical features, while being more flexible in capturing complex interactions among variables. We further propose a hybrid adaptive causal regularization method based on the principle of Hierarchical Prior Fusion. This approach adaptively controls the weight of causal regularization, enhancing the model's performance without compromising its generative capabilities. Comprehensive experiments conducted on seven datasets demonstrate that CausalDiffTab outperforms baseline methods across all metrics. Our code is publicly available at: https://github.com/Godz-z/CausalDiffTab.


Reviews: Causal Regularization

Neural Information Processing Systems

Reasons for score: Lack of clarity regarding some of the main theoretical and empirical results (see detailed comments and improvements for details). Assuming the authors address these points of clarification, my main concern is that the analyses the authors present do not provide a practical method that practitioners can use: if I am understanding correctly, the conclusion is that regularization might somewhat reduce the effects of confounding. But the authors do not provide a way to do sensitivity analysis to check how much confounding still exists or what to do about it, or what assumptions are required for their method to completely identify the causal estimands.

Detailed comments: Regarding the theory: Some of my confusion arises from the fact that I do not fully understand what the authors mean by a "mixing matrix" and \ell "sources". I assumed that it is a random matrix based on their experimental setup where is drawn from a Gaussian distribution.


Reviews: Causal Regularization

Neural Information Processing Systems

This paper discusses the connection between regularization and causality, resting on the simple problem of linear regression and using ridge regression and Lasso as illustrative case studies for the argument. The paper provides original insights on the link between regularization and causality. For the final version, it would be nice if the authors could introduce a bit more context on do-calculus (two lines stating that this is a pivotal tool from the framework of causality) and give more practical insights on the consequences of their results.


Causal Regularization

Janzing, Dominik

Neural Information Processing Systems

We argue that regularizing terms in standard regression methods not only help against overfitting finite data, but sometimes also help in getting better causal models. We first consider a multi-dimensional variable linearly influencing a target variable with some multi-dimensional unobserved common cause, where the confounding effect can be decreased by keeping the penalizing term in Ridge and Lasso regression even in the population limit. The reason is a close analogy between overfitting and confounding observed for our toy model. In the case of overfitting, we can choose regularization constants via cross-validation, but here we choose the regularization constant by first estimating the strength of confounding, which yielded reasonable results for simulated and real data. Further, we show a 'causal generalization bound' which states (subject to our particular model of confounding) that the error made by interpreting any non-linear regression as a causal model can be bounded from above whenever functions are taken from a class that is not too rich.